Unveiling the impact of population density in SDM:
Response curve and variable importance could deliver misleading inference
Jeon, Cheongok1, Kim, Daehyun2
1The Institute for Korean Regional Studies, Seoul National University, Seoul 08826; LophitaL@snu.ac.kr
2Department of Geography, Seoul National University, Seoul 08826; biogeokim@snu.ac.kr
Estimating how species react to their environments
Based on correlation between species occurrence and environment
https://damariszurell.github.io/SDM-Intro/
Species Distribution Model (SDM)?
Models habitat suitability or occurrence probability
Fitting a better model =
Searching for the "best" line against a given environmental variable
Virtual species simulation =
The researcher draws their own line and checks whether the SDM can recover it
SDM!
https://damariszurell.github.io/EEC-MGC/index.html
Assumption of SDM
Species Distribution Model (SDM)
Virtual species simulation
Called the response curve in SDM and
the response function in simulation
The response curve is assumed to be constant across the species
Assumption of SDM
Response curve could depend
on local population density
Population density affects energy savings
Saved energy could be spent on other fitness components, such as reproduction rate
Gilbert et al., J Exp Biol (2008)
The response curve may not be constant across the species
© David Stanley © Waranont (Joe)
© Kris-Mikael Krister
How species react to their environments could
depend on their population density
Social thermoregulation
Antipredator vigilance
Resistance to extreme weather events
The Allee effect could be accounted for
when estimating fitness/habitat suitability
Allee effect: a positive relationship between individual fitness and the number/density of conspecifics
More conspecifics, higher survival/reproduction rate
Fewer conspecifics, lower survival/reproduction rate
Habitat suitability could be correlated with the number of conspecifics, i.e., population density
Stephens, et al. 1999
[Figure: individual fitness versus abundance, with no Allee effect and with a strong Allee effect]
Knowledge gap
Population density could indirectly affect habitat suitability through the Allee effect
e.g., social thermoregulation (abiotic), antipredator vigilance (biotic)
How population density affects SDM inference needs to be examined
Many studies focus on prediction error, not interpretation error
Research question
How can the Allee effect be accounted for in SDM?
How does it show up across different SDM algorithms and interpretation methods?
Methods
Virtual species simulation
Equation solving with the Monte Carlo method
SDM with various algorithms (presence-background, presence-absence)
Model interpretation (response curve, variable importance)
For response curve: SHAP (SHapley Additive exPlanations), PDP (Partial Dependence Plot)
For variable importance: mean absolute Shapley value, permutation variable importance
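As a rough illustration of two of the interpretation methods listed above, the sketch below computes a partial dependence curve and a permutation importance by hand, using only the Python standard library. The `toy_model` function is a hypothetical stand-in for a fitted SDM, and the squared-error loss is an assumption; neither is taken from the study.

```python
import random

def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

def permutation_importance(predict, rows, targets, feature, rng):
    """Increase in mean squared error after shuffling one feature column."""
    def mse(data):
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(data)
    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, shuffled)]
    return mse(permuted) - mse(rows)

# Hypothetical stand-in for a fitted SDM: it responds to x0 and ignores x1
def toy_model(row):
    return 2.0 * row[0]

rng = random.Random(0)
rows = [[rng.random(), rng.random()] for _ in range(200)]
targets = [toy_model(r) for r in rows]

pdp_x0 = partial_dependence(toy_model, rows, 0, [0.0, 0.5, 1.0])
imp_x0 = permutation_importance(toy_model, rows, targets, 0, rng)
imp_x1 = permutation_importance(toy_model, rows, targets, 1, rng)
```

Because the toy model ignores `x1`, shuffling that column leaves the error unchanged, while shuffling `x0` increases it.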
Simulation settings
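Habitat suitability ($Y_{\mathrm{HS}}$) is determined by the environmental variables ($X_n$)
through Gaussian curves:
$$Y_{\mathrm{HS}} = \frac{1}{\sigma_1\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2\right) + \cdots + \frac{1}{\sigma_n\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x_n-\mu_n}{\sigma_n}\right)^2\right)$$
where $x_1, \dots, x_n$ are the values of the environmental variables $X_n$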
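The Gaussian response described here can be sketched in a few lines of Python. This is a minimal illustration, not the simulation code used in the study; the function names and the optimum and breadth values are assumptions chosen for the example.

```python
import math

def gaussian_response(x, mu, sigma):
    """Gaussian density of x with optimum mu and breadth sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def habitat_suitability(xs, mus, sigmas):
    """Sum of the per-variable Gaussian responses."""
    return sum(gaussian_response(x, m, s) for x, m, s in zip(xs, mus, sigmas))

# Illustrative values only: two environmental variables
mus = [15.0, 800.0]
sigmas = [3.0, 150.0]
at_optimum = habitat_suitability([15.0, 800.0], mus, sigmas)
off_optimum = habitat_suitability([25.0, 800.0], mus, sigmas)
```

Suitability is highest when every variable sits at its optimum and drops as any variable moves away from it.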
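The standard deviation (SD) under the Allee effect ($\sigma'_n$) could be a linear function of
population density ($X_{\mathrm{pop}}$), with the Allee effect coefficient ($a'_{\mathrm{pop}\cdot n}$) as slope and the SD in the isolated situation ($\sigma''_n$) as intercept:
$$\sigma'_n = a'_{\mathrm{pop}\cdot n} X_{\mathrm{pop}} + \sigma''_n$$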
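Assume population density is equal to habitat suitability ($X_{\mathrm{pop}} = Y_{\mathrm{HS}}$):
$$X_{\mathrm{pop}} = \frac{1}{\left(a'_{\mathrm{pop}\cdot 1} X_{\mathrm{pop}} + \sigma''_1\right)\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x_1-\mu_1}{a'_{\mathrm{pop}\cdot 1} X_{\mathrm{pop}} + \sigma''_1}\right)^2\right) + \cdots + \frac{1}{\sigma_n\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x_n-\mu_n}{\sigma_n}\right)^2\right)$$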
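Because population density appears on both sides of this equation, it has to be solved numerically. The sketch below uses a simple random search as one possible Monte Carlo approach; it is an assumption about how such a solver might look, not the procedure used in the study. All names and parameter values are illustrative, with one density-dependent variable and one density-independent variable.

```python
import math
import random

def gaussian(x, mu, sigma):
    """Gaussian density with optimum mu and breadth sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def implied_suitability(x_pop, x1, mu1, a_pop, sigma_isolated, x2, mu2, sigma2):
    """Right-hand side of the implicit equation for a candidate density x_pop."""
    sigma1 = a_pop * x_pop + sigma_isolated  # density-dependent breadth
    if sigma1 <= 0.0:
        return float("inf")  # reject candidates that give a non-positive breadth
    return gaussian(x1, mu1, sigma1) + gaussian(x2, mu2, sigma2)

def solve_density(upper, draws=20000, seed=0, **site):
    """Random search: keep the candidate with the smallest |x_pop - rhs(x_pop)|."""
    rng = random.Random(seed)
    best_x, best_gap = 0.0, float("inf")
    for _ in range(draws):
        candidate = rng.uniform(0.0, upper)
        gap = abs(candidate - implied_suitability(candidate, **site))
        if gap < best_gap:
            best_x, best_gap = candidate, gap
    return best_x, best_gap

# Illustrative values only
site = dict(x1=0.5, mu1=0.0, a_pop=0.5, sigma_isolated=1.0, x2=0.0, mu2=0.0, sigma2=1.0)
# With these values each Gaussian term is at most 1 / sqrt(2*pi), so their sum bounds the search
upper = 2.0 / math.sqrt(2.0 * math.pi)
density, gap = solve_density(upper, **site)
```

The returned `gap` shows how closely the best candidate satisfies the equation; a deterministic root finder such as bisection would be a reasonable alternative when the right-hand side is well behaved.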
Results
Presenting a novel method to account for the Allee effect
"Bullet-shaped" curve from a positive Allee effect coefficient ($a'_{\mathrm{pop}\cdot n}$)
"Triangle-shaped" curve from a negative Allee effect coefficient ($a'_{\mathrm{pop}\cdot n}$)
Results: response curve
For response curves, flexible presence-absence algorithms
performed better than presence-background algorithms
BRT and GAM were effective, followed by MaxEnt and RFds
Generalized Linear Model (GLM) showed the poorest performance
MaxEnt showed the highest variance across Allee effect coefficients ($a'_{\mathrm{pop}\cdot n}$)
SHAP gave slightly more accurate interpretation than PDP
Species affected by population density showed higher variance in
RMSE than unaffected species
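The RMSE comparison mentioned here can be illustrated by scoring an estimated response curve against the known response function of a virtual species. The sketch below rescales both curves to [0, 1] before comparing them, because PDP and SHAP outputs are not on the same scale as the true function; that rescaling step is an assumption, and the slides do not state how the curves were aligned. The curve values are invented for the example.

```python
import math

def min_max_scale(values):
    """Rescale a curve to the range [0, 1]; a flat curve maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def rmse(estimated, truth):
    """Root mean squared error between two curves on the same grid."""
    if len(estimated) != len(truth):
        raise ValueError("curves must be evaluated on the same grid")
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimated, truth)) / len(truth))

# Illustrative curves only, evaluated on a shared grid of five points
truth = [0.0, 0.5, 1.0, 0.5, 0.0]
shifted_estimate = [0.1, 0.6, 1.1, 0.6, 0.1]  # right shape, wrong offset
flat_estimate = [0.2, 0.2, 0.2, 0.2, 0.2]     # no response detected

close = rmse(min_max_scale(shifted_estimate), min_max_scale(truth))
flat = rmse(min_max_scale(flat_estimate), min_max_scale(truth))
```

After rescaling, an estimate with the right shape but a constant offset scores near zero, while a flat estimate is penalized.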
Results: variable importance
For variable importance, most models estimated it effectively
when the response did not depend on population density,
except GLM.
However, most models could not estimate variable importance
for species affected by population density
This limitation was more serious for SHAP
The mean absolute Shapley value was not effective
at capturing variable importance when the response curve
depended on population density
BRT and GAM were effective at estimating response curves but
not variable importance
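To make the "mean absolute Shapley value" measure concrete, the sketch below computes exact Shapley values by enumerating feature coalitions and then averages their absolute values per feature. This brute-force version is only practical for a handful of features and is not the SHAP library's implementation; the `toy_model` function, the rows, and the background data are hypothetical.

```python
from itertools import combinations
from math import factorial

def coalition_value(predict, row, background, coalition):
    """Mean prediction with features outside `coalition` taken from background rows."""
    total = 0.0
    for ref in background:
        mixed = [row[j] if j in coalition else ref[j] for j in range(len(row))]
        total += predict(mixed)
    return total / len(background)

def shapley_values(predict, row, background):
    """Exact Shapley values for one row by enumerating all coalitions."""
    n = len(row)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(predict, row, background, set(subset) | {i})
                without_i = coalition_value(predict, row, background, set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi

def mean_abs_shap(predict, rows, background):
    """Variable importance as the mean absolute Shapley value per feature."""
    totals = [0.0] * len(rows[0])
    for row in rows:
        for i, value in enumerate(shapley_values(predict, row, background)):
            totals[i] += abs(value)
    return [t / len(rows) for t in totals]

# Hypothetical stand-in for a fitted SDM: it responds to x0 and ignores x1
def toy_model(row):
    return 3.0 * row[0]

rows = [[0.0, 0.0], [1.0, 5.0]]
background = [[0.0, 1.0], [1.0, 2.0]]
importance = mean_abs_shap(toy_model, rows, background)
```

For this additive toy model all importance is assigned to `x0`, as expected.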
Discussion
The main reason for the performance degradation could be limited information
The Allee effect could cause spatial clustering of presence-absence data, which, given the positive
spatial autocorrelation of environmental variables, could limit the environmental information available.
[Figure: presence and absence (Abs.) points; axes: environmental variable, population density]
Discussion
Our method could be more useful for modeling ecosystem engineer species and
positive feedback on habitat suitability
Accounting for population density could alter our understanding of habitats
[Diagram: population density, habitat suitability, and abiotic environment. In the case of an ecosystem engineer, a feedback loop: increase in habitat modification, increase in habitat suitability, increase in population]
Discussion
Spatial clustering of presence data is often called “bias”, but if it reflects implicit density that affects
habitat suitability, a “bias-removed” model interpretation could be far from the truth
Prediction performance is not always equal to interpretation performance;
explanatory modeling analysis is needed
Valavi et al. 2022
Fourcade et al. 2014
Conclusion
Response curves that depend on population density could hinder proper model interpretation
GAM could be the most robust algorithm for non-linear relationships and
response curves that depend on population density
Certain methods could be less accurate for interpreting species responses
(especially SHAP for variable importance)
Presence-background methods could be an alternative,
but interpretation from various approaches is recommended
Current SDM methods could deliver misleading inference
for species with a strong Allee effect
The Allee effect and its influence on habitat suitability need more attention
Thank you!
@biogeojeon
cheongokjeon@gmail.com
lophital
References
1. Stephens, P. A., Sutherland, W. J., & Freckleton, R. P. (1999). What Is the Allee Effect? Oikos, 87(1), 185–190.
2. Gilbert, C., Blanc, S., Le Maho, Y., & Ancel, A. (2008). Energy saving processes in huddling emperor penguins: From experiments to theory. Journal of Experimental Biology, 211, 1–8.
3. Angulo, E., Luque, G. M., Gregory, S. D., Wenzel, J. W., Bessa-Gomes, C., Berec, L., & Courchamp, F. (2018). Allee effects in social species. Journal of Animal Ecology, 87, 47–58.
4. Fourcade, Y., Engler, J. O., Rödder, D., & Secondi, J. (2014). Mapping Species Distributions with MAXENT Using a Geographically Biased Sample of Presence Data: A Performance Assessment of Methods for Correcting Sampling Bias. PLoS ONE, 9.
5. Valavi, R., Guillera-Arroita, G., Lahoz-Monfort, J. J., & Elith, J. (2022). Predictive performance of presence-only species distribution models: a benchmark study with reproducible code. Ecological Monographs, 92(1), e01486.
Limitations
Strong assumption about the relationship between habitat suitability and population density
Mobility factors should be accounted for in further research, especially in simulation methods
Hyperparameter tuning could change how well the models fit, but our measurement of residuals is limited
The spatial scale of the population density effect is not considered
Residuals: response curve
For the six environmental variables affected by population density,
the red line indicates the MaxEnt SHAP interpretation and
the black line indicates the response function of the virtual species
Green dots indicate presence data
Residuals: SHAP
Each dot indicates a data point used to measure residuals on the response curve
The Y axis indicates whether the model interpretation was an over-estimate or an under-estimate